A Novel Parts of Speech (POS) Tagset for morphological, syntactic and lexical annotations of Saraiki language
Authors
Abstract
One of the important resources required for various Natural Language Processing (NLP) applications, such as machine translation, information retrieval and text mining, is annotated corpora. The text corpora annotation process requires parts-of-speech (POS) tags to mark different items with grammatical annotations in order to identify the linguistic properties of a word, sentence or discourse. The marking of items is based on two main features: 1) category and 2) context (word, sentence, discourse), i.e. the relationship with adjacent and related text. Saraiki, despite being one of the oldest languages, is still a resource-scarce language in recorded literature as well as in the computational context. According to our study, at present there is no tagset defined for the language. This work presents the first hierarchical POS (MPOST) tag set, which is designed to be used for morphological, syntactic and lexical annotations.
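As an illustration of what a hierarchical POS tag can look like in practice, the following is a minimal sketch, not the tag set defined in the paper. The class name `HierarchicalTag`, the dotted label format, and the tag values (`N`, `C`, `SG`, `M`) are all hypothetical placeholders; the record above does not list the actual MPOST tags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HierarchicalTag:
    """A POS tag built from a top-level category plus optional finer levels.

    All tag values used with this class are illustrative placeholders,
    not tags taken from the paper.
    """
    category: str          # top level, e.g. "N" for noun (placeholder)
    subtype: str = ""      # second level, e.g. "C" for common (placeholder)
    features: tuple = ()   # morphological features, e.g. ("SG", "M")

    def label(self) -> str:
        """Join all non-empty levels into one dotted label."""
        parts = [self.category]
        if self.subtype:
            parts.append(self.subtype)
        parts.extend(self.features)
        return ".".join(parts)

    def coarsen(self, depth: int) -> str:
        """Keep only the first `depth` levels of the label."""
        return ".".join(self.label().split(".")[:depth])


def annotate(tokens, tags):
    """Pair each token with the full label of its tag."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    return [(tok, tag.label()) for tok, tag in zip(tokens, tags)]
```

The point of the hierarchy is that one annotation can be read at several granularities: `HierarchicalTag("N", "C", ("SG", "M")).coarsen(1)` keeps only the category level, while `label()` keeps the morphological detail.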
Similar resources
Developing a Pattern Based on Speech Acts and Language Functions for Developing Materials for the Course "The Study of Islamic Texts Translation"
The aim of the present study is to propose a pattern based on speech acts and language functions for developing the materials of the course "The Study of Islamic Texts Translation". In the new pattern, in order to develop better and more engaging materials, and unlike the existing books, Austin's (1962) levels of speech acts, Searle's (1976) classification of speech acts and Halliday's (1978) language functions were used. For this purpose, 57 verses were selected at random from different parts of the Quran...
Wuthering Heights and the Concept of Morality: A Sociological Study of the Novel
To discuss my point, I have collected quite a number of articles, anthologies, and books about "Wuthering Heights" applying various ideas and theories to this fantastic story. Hence, I have come to believe that Gadamer and Jauss are right when they claim that "the individual human mind is the center and origin of all meaning," that reading literature is a reader-oriented activity, that it ...
A Comparative Pragmatic Analysis of the Speech Act of "Disagreement" across English and Persian
The speech act of disagreement has been one of the speech acts that has received the least attention in the field of pragmatics. This study investigates the ways power relations, social distance, formality of the context, gender, and language proficiency (for EFL learners) influence disagreement and politeness strategies. The participants of the study were 200 male and female native Persian s...
A Common Parts-of-Speech Tagset Framework for Indian Languages
We present a universal Parts-of-Speech (POS) tagset framework covering most of the Indian languages (ILs), following the hierarchical and decomposable tagset schema. In spite of a significant number of speakers, there is no workable POS tagset and tagger for most ILs, which serve as fundamental building blocks for NLP research. Existing IL POS tagsets are often designed for a specific language; th...
Effects of First Language on Second Language Writing: A Preliminary Contrastive Rhetoric Study of Farsi and English
To explore this idea, the proposed investigation aimed to find whether the performances of Iranian students studying English in an EFL context are consistent in L1 and L2 writing tasks, and whether there is cross-linguistic transfer in this respect. In this regard, the subjects were instructed to write four compositions, two in English and two in Farsi, which consisted of an ...
Journal
Journal title: Journal of Applied and Emerging Sciences
Year: 2021
ISSN: 1814-070X, 2415-2633
DOI: https://doi.org/10.36785/jaes.111459